PREFACE


The  interpretation  and  significance  of  test  results  is  a  common  subject  of 
contention  in  test  programs.  For  example,  Initial  Operational  Test  and 
Evaluation  (IOT&E)  personnel  often  debate  how  particular  test  results  should 
affect  the  operational  effectiveness  or  suitability  assessment.  In  this 
debate,  the  battle  lines  are  often  drawn  by  organization — the  program  office  on 
one  side,  the  test  community  on  the  other.  Sometimes,  using  the  same  test 
data, engineers from the two organizations reach radically different
conclusions. Which group would be proved right when the weapon was used in the
field?  The  question  of  how  test  assessments  stack  up  against  a  weapon’s  later 
operational  performance  has  far-reaching  implications. 

Today,  IOT&E  assessments  play  a  critical  role  in  acquisition  decision 
making.  Given  their  importance,  how  accurate  are  these  assessments?  For 
example,  did  testers  accurately  predict  maintainability  (fuel  leak)  problems 
with  the  B-1B  or  effectiveness  deficiencies  with  the  Division  Air  Defense 
(DIVAD)  gun?  For  operational  weapons  systems,  it  might  be  possible  to  check 
assessment accuracy by comparing the IOT&E assessments against actual
performance data gathered in the field. Such a comparison could reveal whether IOT&E
assessments  were  ultimately  right  or  wrong — feedback  that  should  have  all  sorts 
of  valuable  applications.  For  example,  diverse  IOT&E  programs  could  be  rated 
for  assessment  accuracy  and  compared,  testing  methods  improved,  and  critics 
silenced. Given that checking IOT&E assessments against operational data
seemed to be common sense, was somebody already doing it?

After  checking  with  the  Air  Force  Operational  Test  and  Evaluation  Center 
(AFOTEC)  and  the  OSD  office  for  OT&E,  it  became  clear  that  IOT&E  assessments 
are never compared to a weapon's later operational performance. (21:--; 22:--)
Nobody  ever  looks  back  at  the  IOT&E  results  to  check  accuracy.  This  report  is 
an attempt to fill this void with a feedback tool for IOT&E called the Operational
Testing Effectiveness Evaluation Method (OTEEM). This is virgin ground
and  the  work  in  this  report  is  really  only  a  starting  point.  Changes  will 
undoubtedly  be  made,  but  the  idea  is  to  get  the  ball  rolling  toward  eventual 
implementation of this potentially valuable idea.

Special  thanks  go  to  my  advisor,  Maj  Larry  Pulcher,  who  helped  me  achieve 
some  degree  of  coherence  in  this  paper. 
EXECUTIVE SUMMARY


“insights  into  tomorrow” 


Part  of  our  College  mission  is  distribution  of  the 
students’  problem  solving  products  to  DoD 
sponsors  and  other  interested  agencies  to 
enhance  insight  into  contemporary,  defense 
related  issues.  While  the  College  has  accepted  this 
product  as  meeting  academic  requirements  for 
graduation,  the  views  and  opinions  expressed  or 
implied  are  solely  those  of  the  author  and  should 
not  be  construed  as  carrying  official  sanction. 


REPORT  NUMBER  88-2090 
AUTHOR(S)  MAJOR  WILHELM  F.  PERCIVAL 

TITLE  THE  OPERATIONAL  TESTING  EFFECTIVENESS  EVALUATION  METHOD 

I. Problem: Operational Test and Evaluation (OT&E) plays a vital role in
weapons  system  acquisition.  Decision  makers  rely  on  Initial  Operational  Test 
and  Evaluation  (IOT&E)  effectiveness  and  suitability  assessments  when  making 
acquisition  decisions.  The  problem  is  that  there  is  currently  no  attempt  to 
check  the  accuracy  or  adequacy  of  these  OT&E  assessments — in  short,  no  way  to 
evaluate OT&E effectiveness.

II.  Objectives:  The  objective  of  this  report  is  to  support  the  need  for 
feedback in operational testing and introduce a technique designed to measure
IOT&E effectiveness. This proposed technique is called the Operational Testing
Effectiveness Evaluation Method (OTEEM). The report will show that OTEEM
should  be  implemented. 

III.  Discussion:  The  first  two  chapters  of  the  report  support  the  need  for 
OTEEM.  Chapter  One  shows  how  critics  have  disputed  the  adequacy  and  accuracy 
of  testing  in  such  systems  as  the  B-1B,  the  Division  Air  Defense  Gun,  and  the 
Advanced Medium-Range Air-to-Air Missile. These disputes highlight the need
for an objective evaluation of operational testing effectiveness. The second
chapter reviews the history of OT&E, discussing the possibility that the record
of frequent convulsive organizational change is related to the lack of
adequate feedback on OT&E effectiveness. Furthermore, without objective
feedback, today's managers may continue the historical pattern of ineffective
change. The third chapter lays the foundation for OTEEM by reviewing the
present mission of OT&E and the challenges OT&E personnel must face. The
chapter is intended for those unfamiliar with OT&E. Chapter Four is the crux
of the report, as it introduces and applies OTEEM to the Air Launched Cruise
Missile (ALCM). OTEEM relies on a comparison of the IOT&E assessments made in
the IOT&E final report and the results of field testing summarized in the FOT&E
Phase One final report. The method compares the areas of operational effectiveness,
suitability, critical issue assessment, and deficiency reporting.
Survivability is mentioned as an OTEEM assessment area, but is not included in
the  example  due  to  classification.  The  ALCM  example  serves  to  illustrate  the 
OTEEM  technique  and  suggest  improvements  or  problems.  Chapter  Five  discusses 
several  findings,  including  problems  and  concerns  raised  by  the  application 
example. 

IV.  Findings:  The  application  exercise  shows  that  OTEEM  is  capable  of 
uncovering problems in IOT&E. The in-depth OTEEM analysis of a test program
provides valuable insights for the OT&E manager. In addition to scrutinizing
individual  programs,  the  manager  would  be  able  to  summarize  and  compare 
numerous  test  programs  to  assess  broad  trends  in  operational  testing.  Other 
benefits  would  include  the  fine-tuning  of  effectiveness  and  suitability 
forecasting  techniques  and  the  identification  of  common  pitfalls  for  OT&E 
managers  to  avoid.  Finally,  several  minor  improvements  in  final  report  format 
or  approach  would  facilitate  OTEEM  application.  Overall,  OTEEM  seems  to  offer 
significant benefits, including increased confidence in OT&E assessments, for
minimal cost.

V.  Recommendation:  The  Air  Force  Operational  Test  and  Evaluation  Center 
(AFOTEC) should begin a trial OTEEM application program. After this trial
period,  a  finalized  form  of  OTEEM  should  be  implemented. 
CHAPTER ONE


THE  NEED 

So in war, through the influence of innumerable trifling circumstances,
which on paper cannot properly be taken into consideration,
everything  depresses  us  and  we  come  far  short  of  our  mark. (19:45) 

-  Clausewitz 

Our  weapons  tests  now  use  so  much  computer  modeling  and  simulation 
that no one knows whether some new arms really work. (4:50)

-  Discover  Magazine 

The debate rages in the press, in technical journals, in the halls of the
Pentagon and Congress, and in the crew lounges of operational squadrons. Will
the new high-tech weapons work in combat, or even in peacetime? Before it buys
these weapons, the Air Force tests them to answer that question. Therefore,
poor weapon effectiveness, if it exists, can be intimately linked to poor
testing effectiveness. Currently, the Air Force has no way of objectively and
routinely judging the effectiveness of weapons system testing. This report is
about  a  method  designed  to  provide  objective  feedback  on  the  effectiveness  of 
Initial  Operational  Test  and  Evaluation  (IOT&E).  The  first  two  chapters 
establish  the  need  for  this  method.  Chapters  Three  and  Four  develop  the  method 
and  demonstrate  its  use.  The  final  two  chapters  examine  the  issues  raised  by 
earlier  chapters,  summarize  the  report,  and  recommend  action. 

As  the  title  states,  this  chapter  is  about  the  need.  In  an  Air  Force 
where every conceivable performance dimension is measured, it seems odd to
argue for more feedback. However, IOT&E is an area where objective feedback is
critically needed. To appreciate why the Air Force needs to evaluate the
effectiveness of operational testing, it helps to review the official purpose
of test and evaluation and then contrast its utopian wording with some short
examples of real-world controversy.

Over  the  years,  the  Air  Force  established  test  and  evaluation  procedures 
to  find  out  if  weapons  work.  According  to  Air  Force  Regulation  80-14,  the 
purpose of all test and evaluation is: "to identify, assess, and reduce the
acquisition risks; to evaluate operational effectiveness and operational
suitability; to identify any deficiencies in the system; and to ensure that
only operationally effective and suitable, supportable systems are delivered to
the operating forces." (17:2) In other words, testing determines if weapons
work as advertised and forecasts their effectiveness on the battlefield.
Furthermore, testing ensures that only effective and suitable weapons make it
to the ramp. Sounds easy, but as the following examples imply, the testing job
is  much  more  difficult  than  it  appears. 


The  nation’s  newest  strategic  bomber,  the  B-1B,  is  flying  through  a  storm 
of  controversy  surrounding  its  operational  capabilities.  In  recent  months,  the 
aircraft  has  received  negative  press  on  problems  ranging  from  fuel  leaks  to 
faulty defensive avionics. (10:--) B-1 supporters contend the aircraft is just
experiencing "routine" difficulties; nothing to be alarmed about. (9:--)
However, with articles like "The B-1 Bomber: A Flying Lemon?" spreading alarm
seems to be the media's goal. (9:--) The Air Force Chief of Staff, responding
to the feeding-frenzy atmosphere generated by B-1 critics, has complained about
". . . hypercritical reports in the media, even in such level-headed places as
Texas." (3:--) Meanwhile, testifying to congressional subcommittees, "Gen.
Lawrence Welch admitted that the Air Force failed to adequately test major B-1B
subsystems before they were integrated into the aircraft." (8:264) Questions of
adequacy and objectivity have also dogged other Department of Defense (DoD)
test  programs. 

One  such  program,  the  Division  Air  Defense  Gun  (DIVAD),  is  significant 
because the DoD directives governing the Army's DIVAD testing also govern Air
Force testing. DIVAD, or Sgt York, was ". . . the first major weapons system
to be scrubbed in eight years and the first in decades to be canceled so far
into production." (6: ) When the system was canceled, a significant amount of
testing had already been performed. According to the DoD directives current at
the time, entry into Full Scale Development (FSD) required "adequate" developmental
and operational testing to identify risks, "feasible solutions," and
"estimate the potential operational effectiveness . . ." (10:13-14) Unfortunately,
some of this testing was apparently rigged in DIVAD's favor. (4:56)
However, the subsequent operational testing required for Low Rate Initial
Production (LRIP) finally and conclusively showed the weapon was a flop. (7:44)
Therefore, after experiencing initial difficulties, IOT&E successfully revealed
DIVAD's problems.

The  senior  executive  charged  with  the  operational  evaluation  of  new 
weapons,  OSD’s  Director  of  Operational  Test  and  Evaluation  (DOTE),  proved  the 
value of independent OT&E by blowing the whistle on DIVAD. But even then, he
had to respond to accusations of soft-pedalling DIVAD problems. (7:46) Apparently,
some politicians doubted DOTE's objectivity. Writing in 1986, Senator
Gary Hart said: "It [DOTE] is playing the same 'go along to get along, keep
everybody happy by keeping the money flowing' game that has too often undermined
past operational testing and effective weapons." (7:42) In the end, Sgt
York cost 1.8 billion dollars and, according to some, another black eye for
weapons acquisition and testing. (4:56) Certain critics think the Advanced
Medium-Range Air-to-Air Missile (AMRAAM) could be another DIVAD. (7:46)

The  AMRAAM  was  certified  for  LRIP  on  28  Feb  86,  despite  DOTE  memoranda 
warning there was a "low probability of adequate test results" being obtained
prior to certification. (7:46) Simultaneously, a General Accounting Office
report critical of the missile added fuel to the fire of AMRAAM critics on
Capitol Hill. (7:46) Concerning DOTE's credibility on this issue, one House
aide quipped: "We have fire and storm emanating from memos, but when it comes
to making a really tough decision, the lion becomes a mouse." (7:46) Taken
together, the AMRAAM, DIVAD, and B-1 controversies raise urgent questions about
the effectiveness, accuracy, and adequacy of IOT&E.


Are  IOT&E  assessments  effective,  accurate,  and  adequate?  Unfortunately, 
the  Air  Force  is  ill-equipped  to  answer  this  question.  Currently,  there  is  no 
formal  review  of  IOT&E  assessments  in  light  of  later  operational  experience 
with a weapon, and no procedure for checking IOT&E predictions against reality.
Instead, weapons testing is challenged and defended in an emotionally charged
atmosphere with little objective data; a situation not conducive to unbiased
evaluation. Emotionalism and polemics are not good ways to judge testing, and
some kind of objective evaluation is important to ensure IOT&E is doing the
job.

An objective evaluation of IOT&E could produce several benefits. Conceivably,
it would highlight testing problem areas and guide changes in
organization or technique to solve them. Furthermore, implementation of an
objective evaluation system would show critics that OT&E management is effective
and concerned with improvement. Evaluation also has the potential to
improve  the  credibility  of  IOT&E  assessments.  At  the  very  least,  an  evaluation 
of  the  operational  testing  conducted  for  each  new  weapon  system  would  provide  a 
feedback  step  currently  missing  in  the  acquisition  process — a  step  obviously 
required  for  any  hope  of  future  IOT&E  improvement.  After  all,  it’s  difficult 
to  improve  if  current  IOT&E  performance  is  unknown.  The  unknown  accuracy  and 
adequacy of IOT&E contribute to the weapons acquisition controversies mentioned
earlier in the chapter.

The  chapter  began  with  a  common  concern:  Will  the  new  weapons  work?  The 
question is clearly related to weapons testing and the fact that no way exists
to objectively measure testing effectiveness. Proposing a way to evaluate
IOT&E effectiveness is the purpose of this report. The rest of the chapter
elaborated on the need for this evaluation method. It contrasted the official
purpose of testing with the real-world problems of the B-1, DIVAD, and AMRAAM.
The  debate  over  the  performance  of  these  weapons  is  reason  enough  to  examine 
the  effectiveness  of  operational  testing.  Finally,  some  possible  benefits  of 
IOT&E  evaluation  were  listed. 

Chapter  Two  will  briefly  review  some  of  the  history  of  operational 
testing,  a  record  fraught  with  reorganization  and  turbulent  change.  This 
restless  search  for  effective  operational  testing  further  supports  the  need  for 
an  objective  evaluation  method. 
CHAPTER TWO


HISTORICAL OT&E: A RESTLESS SEARCH

We  regard  the  creation  of  the  testing  and  evaluation  group  as  of  the 
utmost importance, since we believe most of our previous failures to
be  prepared  for  wars.  .  .  would  have  been  thoroughly  exposed  had  an 
adequate  program  of  testing  and  evaluation  existed. (13:26) 

-  President’s  Scientific 
Advisory  Committee,  1970 

Keep on going and chances are you will stumble on something. (2:177)

-  Charles  F.  Kettering 

The  history  of  operational  test  and  evaluation  is  a  turbulent  chronicle 
filled  with  disputes  over  various  issues.  Judging  by  the  number  of  changes, 
the  issue  causing  the  most  disagreement  was  how  to  organize  for  operational 
testing. Who was to do it, and who should supervise it? Tracing the organizational
development of operational testing leads through a bewildering maze of
command and staff structures. This chapter concentrates on the pattern of
organizational change in OT&E history. The pattern is significant since a
high frequency of change is expected when a poorly operating system lacks
appropriate feedback. In the case of OT&E, managers knew they had to change
something; they just didn't know what. Inappropriate changes led to unforeseen
problems  eventually  requiring  still  more  changes.  Although  difficult  to  prove, 
the  lack  of  appropriate  feedback  may  be  partially  responsible  for  30  years  of 
organizational  flux.  This  unfortunate  pattern  began  when  operational  testing 
started  to  split  away  from  traditional  testing  in  the  1930s. 

One of the first organizations expressing an interest in separate operational
or tactical testing was the Air Corps Tactical School (ACTS). In the
1930s, the school was one of the lead agencies developing the emerging air
power doctrine proposed by Douhet, Trenchard, and Mitchell. (20:45) Interested
in how new airplanes could be used to tactically execute the doctrine, the
school naturally wanted to begin testing. However, the ACTS desire to test
sparked an immediate controversy with the traditional test agency at Wright
Field. (13:10) In 1934, a study group appointed by the Secretary of War, the
Baker Board, recommended that an independent test unit be set up at the ACTS.
(13:9) No action was taken and the controversy continued until 1939, when the
Air Corps created a dedicated test unit, the 23rd Composite Group, under the
Air Corps Board. (13:10) With this action, the Air Corps separated operational
testing  from  developmental  testing  done  at  Wright  Field— the  first  shot  of  an 
organizational  war  lasting  30  years. 


Operational  testing  was  off  and  running  on  its  own,  but  not  without  growing 
pains. In 1940, to make room for pilot training, the Air Corps transferred the
23rd to Orlando. (13:11) "Moving the 23d (sic) to Orlando created an unsatisfactory
situation -- the 23d still did the majority of its testing at Eglin
Field, but remote from its headquarters at Orlando and from the Air Corps Board
at Maxwell." (13:11) In 1941, the 23rd Composite Group left the Air Corps Board
and became part of the new Air Corps Proving Ground at Eglin Field, a group
charged with tactical testing. (13:12-13) Complexity grew as new organizations
were added in 1942. In that year, as part of a massive reorganization of the
new Army Air Forces, the Pentagon's Directorate of Military Requirements was
created to facilitate the incorporation of "combat lessons" in the new aircraft.
(13:13) But the reorganizers also saw a need for still another testing
group. They created the Army Air Force School of Applied Tactics at Orlando,
Florida, to teach combat-proven tactics to new aviators and "test the tactical
suitability" of aircraft already tested at the Proving Ground. (13:14) The
Orlando  school  was  the  third  agency  charged  with  some  type  of  testing,  and  the 
second  performing  operational  testing. 

For  a  nation  at  war,  three  independent  testing  agencies  proved  overly 
cumbersome.  Finally,  in  a  1943  consolidation,  both  the  School  of  Applied 
Tactics  and  the  Proving  Ground  were  reassigned  to  the  Army  Air  Force  Board, 
reporting  to  the  Directorate  of  Operations,  Commitments,  and  Requirements  in 
Washington, DC. (13:15) However, problems continued until 1945, when it seemed
"the system continued to work only because of the cooperation of the various
commanders involved." (13:17) In 1946, responsibility for all operational
suitability and tactical employment testing was transferred to the Army Air
Force Proving Ground Command. (13:18-19) But somehow, the Air University
inherited the test oversight responsibilities of the defunct AAF Board.
"Besides their academic training and research responsibilities, Air University
was responsible to 'plan and supervise the development and testing of new and
improved methods and techniques of aerial warfare; and to approve, activate,
and designate test agencies and monitor all projects involving tactical unit
testing.'" (13:19) Unfortunately, the Air University had no association with
the  Proving  Ground  Command  or  its  resources. 

When  General  Fairchild  began  to  gather  the  resources  needed  to  fulfill  Air 
University’s  testing  role,  General  Quesada,  Commander  of  the  Tactical  Air 
Command, violently objected. (13:19) He believed that operational testing
belonged  with  the  commands,  not  with  the  academics  of  Air  University.  General 
Spaatz  agreed  and  barred  Air  University  from  the  testing  business. (13:19)  In 
1947,  as  the  Army  Air  Forces  became  the  US  Air  Force,  developmental  testing 
belonged to the Materiel Command, operational suitability and tactical development
testing belonged to the Air Proving Ground Command, and the operational
commands  performed  operational  effectiveness  testing.  Unfortunately,  problems 
continued  since  "the  Air  Proving  Ground  Command,  operating  in  conjunction  with, 
but  separate  from,  the  Air  Materiel  Command  and  the  operational  commands,  could 
not  satisfy  all  observers  in  its  role,  nor  could  it  represent  the  operational 
commands  properly.  Rapid  technological  advancement  and  increasing  costs 
provoked misgivings about how research and development was conducted." (13:20)
Even as a separate service, the Air Force was unable to end spasmodic organizational
change in the operational testing business.


Between 1947 and 1970, there were several major changes in the organization
of OT&E. In 1957, the Air Proving Ground Command was shorn of its independence
and absorbed into the Research and Development Command. (13:23) Problems
proliferated, and in 1964, a special Air Staff office was created to monitor
OT&E. (13:25) But by this time, the operational commands responsible for
effectiveness  testing  were  otherwise  occupied  with  the  growing  war  in  Southeast 
Asia. 

Vietnam  stressed  the  inefficient,  confusing  operational  testing  system  to 
the breaking point. To fight the war, the Air Force needed new systems on the
ramp as soon as possible and operational testing took time. Total Package
Procurement became a popular acquisition technique and committed the Air Force
to production of new weapons without sufficient OT&E. "Costs soared, systems
suffered long delays, and many systems experienced reliability and maintenance
problems after deployment." (13:25) In 1970, the President's Scientific
Advisory Board gave the Air Force failing marks for acquisition: "It became
clear that system failures, high acquisition costs, and extensive post-production
system modifications could be attributed to inadequate OT&E and, in some
cases, to the complete lack of OT&E prior to production." (13:26) As a solution,
the Blue Ribbon Defense Panel recommended the creation of a testing
office at the Secretary of Defense level. (13:26) Reorganization was still the
preferred  fix  for  testing  problems. 

From  1939  to  1970,  major  reorganizations  scrambled  operational  testing 
units every few years. However, the result was not a highly efficient,
involved operational testing organization responsive to field requirements.
Instead, after 30 years of alternative wiring diagrams and command structures,
a presidential panel had pronounced the acquisition system a failure due to
inadequate OT&E. Apparently the changes, although frequent, didn't work.
Today’s  managers  should  be  concerned  about  this  historical  pattern  of  change 
for  a  couple  of  reasons. 

First,  many  of  the  changes  in  operational  testing  were  made  after  problems 
showed up in wartime. New weapons either weren't incorporating the lessons
learned in combat, or testing was taking too long and not providing the effective,
suitable aircraft needed to do the job. Significantly, today's systems
are untested in combat. Is IOT&E doing a good job, or is it unintentionally
masking  deadly  deficiencies?  A  war  would  provide  answers,  but  leaves  a  lot  to 
be  desired  as  a  feedback  tool. 

Secondly, history shows there is no easy fix for complex OT&E problems.
Obviously, when a system must be changed again and again, the changes aren't
working. Judging from the great number of changes made, improving OT&E is no
trivial task. For one thing, large changes in any complex structure are likely
to lead to unforeseen consequences. This is particularly true if the manager
has difficulty pinning down the exact cause of the problem. The fact that
eight different investigative boards worked on weapons testing in the 1970s is
testimony to the difficulty of the problem. (11:2) One could pessimistically
conclude from history that changes in OT&E organization will continue forever,
each  new  change  resulting  in  undesirable  outcomes. 


In summary, the first 35 years of OT&E history are characterized by
recurring organizational change. Struggling to improve the value of operational
testing, managers tried various organizational schemes. In some years, OT&E
was subordinate to developmental testing; at other times, OT&E was done by
operational commands outside the acquisition system. It became obvious in
1970 that all the changing had not improved Air Force OT&E. In fact, the
Vietnam war exposed several examples of complete OT&E failure. All the years
of  changing  had  led  only  to  more  problems — problems  aggravated  by  combat. 

Today, testers can't be dependent on combat to evaluate IOT&E. Complex
testing issues demand high-resolution feedback that shows the exact nature of
each problem. Only by fixing the specific problems can testers avoid changes
that bring unforeseen consequences. It's high time a method was developed that
could provide such feedback before the next war starts. Taking the first step
toward that feedback technique, Chapter Three defines OT&E's present-day
mission  and  the  challenges  to  that  mission. 
CHAPTER THREE


OT&E  MISSION  AND  CHALLENGES 

. . . From 1980 through 1984, DoD itself slowed down or stayed the
production of twenty-six weapon systems upon discovering deficiencies
during operational testing. (5:41)

- AIR FORCE Magazine

Previous  chapters  discussed  the  need  for  objective  feedback  on  IOT&E 
effectiveness.  This  chapter  is  for  readers  unfamiliar  with  operational 
testing.  Naturally,  measuring  the  effectiveness  of  any  process  requires 
complete  familiarity  with  the  process  and  its  goals.  Accordingly,  this  chapter 
lays  the  foundation  of  the  Operational  Testing  Effectiveness  Evaluation  Method 
(OTEEM) by reviewing the OT&E mission. The prospective evaluator must also
know what challenges OT&E is likely to face along the way. The OTEEM should
measure  IOT&E  mission  accomplishment  with  particular  attention  to  the  possible 
deficiencies  caused  by  these  challenges.  The  first  step  is  to  review  the 
mission  of  OT&E. 

The  mission  statement  from  Chapter  One  applies  to  both  developmental  and 
operational  testing:  "Their  primary  purposes  are:  to  identify,  assess,  and 
reduce  the  acquisition  risks;  to  evaluate  operational  effectiveness  and 
operational  suitability;  to  identify  any  deficiencies  in  the  system;  and  to 
ensure  that  only  operationally  effective  and  suitable,  supportable  systems  are 
delivered to the operating forces." (17:2) Air Force Regulation 80-14 defines
some  of  these  terms. 

Acquisition Risk. The chance that some element of an acquisition
program produces an unintended result with adverse effect on system
effectiveness, suitability, cost, or availability for deployment.

Operational Effectiveness. The overall degree of mission accomplishment
of a system used by representative personnel in the context of the
organization, doctrine, tactics, threats (including countermeasures and
nuclear threats), and environment in the planned operational employment
of the system.

Operational Suitability. The degree to which a system can be satisfactorily
placed in field use, with consideration being given to availability,
compatibility, transportability, interoperability, reliability,
wartime usage rates, maintainability, safety, human factors, manpower
supportability, logistic supportability, and training requirements.

Maintainability. A measure of the time or maintenance resource needed
to keep an item operating or restore it to operational . . . status.
Maintainability may be expressed as the time to do maintenance . . . as
a usage rate of manpower resources . . . as the total required manpower
. . . or as the time to restore a system to operational
status. . . .

Reliability.  The  probability  that  an  item  will  perform  a  required 
function  under  specified  conditions  for  a  specified  period  of  time  or 
at  a  given  point  in  time.  Also  expressed  as  the  average  time  an  item 
will  perform  a  specified  function  without  failure. 

Critical Issue. Those aspects of a system's capability, either
operational,  technical,  or  other,  that  must  be  answered  before  a 
system’s  overall  worth  can  be  estimated,  and  that  are  of  primary 
importance  to  the  decision  authority  in  deciding  to  allow  the  system 
to  advance  into  the  next  acquisition  phase. (17:34-37) 

With these definitions in mind, the specific function of OT&E is: "to
ensure that only operationally effective and suitable systems are delivered to
the operating forces." (17:7) It does this by "identifying, assessing, and
reducing" the possibility that something unexpected will have a negative effect
on some characteristic of the system. Contrast OT&E's concern for the operating
forces with the purpose of Development Test and Evaluation (DT&E): "That
testing and evaluation used to measure progress, verify accomplishment of
developmental objectives, and to determine if theories, techniques, and
materiel are practicable; and if systems or items under development are
technically sound, reliable, safe, and satisfy specifications." (17:34) DT&E
emphasizes feasibility and specification compliance, while OT&E is concerned
with predicting, verifying, and improving the capabilities and characteristics
of an operational weapon. OT&E has the following specific objectives:

a.  Evaluate  the  operational  effectiveness  and  operational  suitability 
of  the  system. 

b.  Answer  unresolved  critical  operational  issues. 

c.  Identify  and  report  operational  deficiencies. 

d.  Recommend  and  evaluate  changes  in  system  configuration. 

e.  Provide  information  for  developing  and  refining: 

(1)  Logistics  and  software  support  requirements  for  the  system. 

(2)  Training,  tactics,  techniques,  and  doctrine  throughout  the  life 
of  the  system. 

f. Provide information to refine operation and support (O&S) cost
estimates and identify system characteristics or deficiencies that can
significantly affect O&S costs.

g.  Determine  if  the  technical  publications  and  support  equipment  are 
adequate. 

h. Assess the survivability of the system in the operational
environment. (17:7)

There are three types of operational testing used to achieve the above
objectives: Qualification Operational Test and Evaluation (QOT&E), Initial
Operational Test and Evaluation (IOT&E), and Follow-on Operational Test and
Evaluation (FOT&E).


All  three  types  have  important  functions  in  the  operational  test  mission. 
QOT&E  is  primarily  concerned  with  modifications  to  existing  equipment,  or 
introduction  of  off-the-shelf  equipment  that  requires  no  special  research  and 
development. (17:3) This report concentrates on IOT&E and FOT&E. These two
kinds  of  operational  testing  are  actually  different  test  phases  conducted  on 
the  same  weapon  system.  IOT&E  is  performed  prior  to  production  for  major  new 
systems  requiring  research  and  development.  FOT&E  is  further  operational 
testing  performed  after  the  production  decision  is  made — in  fact,  throughout 
the lifetime of the fielded weapon system. (17:3) While IOT&E and FOT&E examine
many  of  the  same  characteristics  and  share  some  objectives,  they  have  different 
purposes. 


IOT&E's purpose is reflected in the OT&E mission statement. Making sure
that only effective and suitable weapons get to the ramp is a two-step process.
First, the operational tester must be able to distinguish the weapons that
aren't effective and suitable; and second, report before a production decision
is made. The use of operational testing to support decision milestones was
introduced in the 1970s. (11:9) Today, the primary purpose of IOT&E is to
provide information for decision makers on operational effectiveness and
suitability at each decision milestone in the acquisition process. (17:3)
Operational testers are increasingly involved in earlier stages of development,
providing data on the operational value of proposed weapons, as well as an
operational perspective in the development process. (12:9)


Like IOT&E, FOT&E's primary purpose is determined by its timing in the
acquisition process. In a classic acquisition program, FOT&E starts after the
production decision is made. Therefore, its goal is no longer oriented toward
decision making. Instead, FOT&E strives to improve the weapons system or the
way it's used. In the words of AFR 80-14: "It is used to refine estimates
made during IOT&E, to evaluate changes made to correct deficiencies found in
prior T&E, and to identify additional deficiencies." (17:4) It also helps
". . . to find out whether the system can meet changing operational
requirements; to develop or refine employment tactics; to determine the system's
operational effectiveness and suitability characteristics . . . and to refine
doctrine and training programs." (17:4) FOT&E refines pre-production IOT&E
estimates so users can more efficiently employ the weapon. FOT&E is necessary
because of several challenges that cause uncertainty in IOT&E results.


The  operational  tester  faces  numerous  challenges.  These  include  excessive 
emphasis on cost and schedule, lack of realism in testing, politics or a lack
of independence, and the changing threat. These challenges may cause IOT&E
estimates to fall wide of the mark. A brief discussion of each challenge and
how it might affect a weapon system should prove useful in designing a
measurement system to judge test effectiveness. Basically, OTEEM will measure how
much  the  aggregate  of  these  challenges  affects  a  particular  test  program.  The 
first  of  the  challenges,  excessive  emphasis  on  cost  and  schedule,  can  cause  a 
number  of  problems. 

Operational testing of nonproduction-representative equipment or lack of
sufficient  operational  testing  are  indicative  of  excessive  emphasis  on  cost  and 
schedule.  This  undesirable  situation  is  sometimes  unavoidable  if  the  immediate 
need  for  the  system  is  overwhelming.  As  Mr.  Jack  Krings,  DOTE,  says:  "In  some 
cases,  the  operational  effectiveness  may  be  secondary.  Sometimes  you  have  to 
buy  a  scarecrow — it  won't  kill  many  birds,  but  it'll  keep  a  lot  of  them 
away." (4:52) He went on to say: "It's vital to get something out as a
deterrent, and maybe you can fix it after it's out there. . . . That doesn't
sound like very good policy in terms of being very firm about operational
requirements, but sometimes it's just a more practical way." (4:53) When
there's a schedule crunch, IOT&E test directors may be asked to test hand-built
FSD hardware rather than wait for production-representative systems.
Unfortunately, test teams that use such shortcuts may misjudge critical
characteristics like reliability and maintainability. If unpleasant surprises in any
of  the  system  characteristics  are  traced  to  production  line  changes,  then  it’s 
possible  production-representative  systems  were  never  tested.  A  different  but 
closely  related  consequence  of  cost-and-schedule  mania  is  insufficient  build-up 
testing. 


                      IOT&E                          FOT&E

MAIN PURPOSE          Decision making                Improve system/use of
                                                     system

SECONDARY PURPOSE     Improve system/                Refine estimates of
                      estimate use data              IOT&E

PRIMARY OBJECTIVES    Learn effectiveness            Recommend/evaluate
                      and suitability                changes

                      Answer critical issues         Identify/report
                                                     deficiencies

                      Identify/report                Refine operating info
                      deficiencies                   for logistics, tactics,
                                                     training, etc.

                      Assess survivability

SECONDARY OBJECTIVES  Obtain operating info          Tech order/support
                                                     equipment eval

                      Assist tech order/support      Refine estimates of
                      equipment development          effectiveness,
                                                     suitability,
                                                     survivability

                      Recommend/eval. changes

Table 1. OT&E Purposes and Objectives


Insufficient  operational  or  developmental  testing  can  have  disastrous 
results.  As  a  weapon  system  approaches  the  end  of  IOT&E  or  begins  FOT&E, 
testers often run comprehensive operational tests as a kind of "final exam."
Since many new and uncontrolled factors like inexperienced crews or maintenance
are often present during FOT&E or late IOT&E, it's more difficult to trace the
exact cause of a malfunction. Moreover, insufficient build-up operational or
developmental testing can cause painful questions about system reliability
after an unexpected system failure. Late in FSD, decision makers expect the
system to be fairly refined, and news of the failure, coupled with the
uncertainty of its cause, can lead to further program delays and cutbacks.
Therefore, major problems suddenly occurring in the later stages of IOT&E or early
FOT&E may be related to insufficient testing caused by too much emphasis on
cost and schedule.

Like insufficient testing, lack of realism may also lead to alarming
revelations when the chips are down. Some realism will always be lacking in
operational testing. For example, it's inappropriate to fire live
surface-to-air missiles at a B-1B just to test its countermeasures. Until the system is
used in an operational environment, undetected problems may lurk in the design.
Unfortunately, the test ranges and techniques used in IOT&E may be used again
to test the system in FOT&E, never revealing these hidden problems. If system
failures show up after initial use in the field, suspect a lack of realism in
IOT&E. The next challenge, politics or the lack of independence, is popular
with the press.

At least one researcher sees the history of OT&E as a search for
independence. (13:--) The three systems briefly discussed in Chapter One are examples
of alleged lack of independence. If a test program really did suffer from this
malady, test reports might not include much negative information. Statements
like "insufficient data exists but simulations of projected system capabilities
indicate" signal problems with independence. However, since equipment problems
don't have politics, hidden malfunctions will inevitably show up when the
system reaches the field. The final challenge, the changing threat, exists
because the acquisition process takes time.


CHALLENGE                        EFFECT

Cost and Schedule
  - nonproduction equipment      - production-related defects
  - insufficient testing         - unsuspected major failure in
                                   late IOT&E or early FOT&E

Lack of Realism                  - failure in initial field use

Politics/Lack of Independence    - numerous unpredicted major
                                   failures

Changing Threat                  - obsolescence when reaching
                                   field

Table 2. OT&E Challenges and How They Affect Weapon Performance

New weapons are designed to counter the projected threat. However, as
full-scale development continues over a period of years, the threat may change.
To some degree, DIVAD was a victim of this process. (12:14) Unfortunately,
testers and/or the contractors may not be aware of the new threat developments.
Obsolescence also results when a system takes longer in development than
anticipated. Newly fielded systems checkmated by enemy threat development are
casualties of this challenge. This short list of problems, summarized in Table
2, is by no means a complete list of the challenges facing operational testers,
but gives some idea of how difficult OT&E can be. With all the potential
challenges out there, it makes sense to find out how much they really affect
testing.

This chapter discussed the tasks that today's OT&E must accomplish. IOT&E
and FOT&E have similar objectives, but IOT&E's emphasis is on information for
decision makers. FOT&E's emphasis is on improving a weapon or its employment.
The different objectives were divided into primary and secondary categories in
Table 1. Several challenges were discussed, including over-emphasis on cost
and schedule, lack of realism and independence, and the changing threat. The
purpose/objectives summary in Table 1 and a knowledge of the different
challenges summarized in Table 2 provide the basis for the evaluation method
developed for IOT&E in Chapter Four.


CHAPTER  FOUR 


THE OPERATIONAL TESTING EFFECTIVENESS EVALUATION METHOD:

OTEEM AND IOT&E

It is error only, and not truth, that shrinks from inquiry. (18:15)

-  Thomas  Paine 

Previous  chapters  addressed  the  testing  controversy,  the  related  need  for 
objective feedback, the historical flux in testing organization, and IOT&E's
current role in weapons system acquisition. This chapter introduces OTEEM, a
method for obtaining IOT&E feedback. First, the desirable features of OTEEM
are discussed. Next, OTEEM methodology is explained and used to examine the
IOT&E  program  of  an  actual  weapons  system.  This  example  is  illustrative  only. 
The  detailed  evaluation  of  a  test  program  using  a  refined  OTEEM  is  beyond  the 
scope  of  this  report. 


DESIRABLE  FEATURES 

There  are  four  features  or  characteristics  that  OTEEM  should  possess.  The 
first  of  these  is  goal  orientation.  The  IOT&E  primary  goals  covered  in  the 
last chapter were: (1) learn effectiveness and suitability, (2) answer
critical issues, (3) identify and report deficiencies, and (4) assess
survivability. Since the IOT&E secondary goals are more participatory in nature
and have less impact on acquisition decisions, OTEEM concentrates exclusively
on  the  IOT&E  primary  goals. 

OTEEM must also be sensitive to the damaging effects of the OT&E
challenges. Recall that the four main challenges were: (1) overemphasis on cost
and schedule, (2) lack of realism in testing, (3) politics/lack of
independence, and (4) the changing threat. Since the effects of these challenges are
usually apparent when a weapon becomes operational, OTEEM should consider
information gathered from the field. However, such information gathering must
be  practical  and  cost  effective,  the  next  characteristic. 

The data necessary to support OTEEM must be readily available and
inexpensive to obtain. Overburdened operating commands won't spend a lot of effort on
a project that doesn't yield immediate operational benefits. Moreover, the
method must be inexpensive in light of increasing budget cuts. Therefore,
OTEEM should use only information that's already available and cheap to
assemble.

Using this cheap, readily available data, the OTEEM should provide a
universally applicable summary of test program effectiveness. OTEEM should be
capable of evaluating the entire spectrum of acquisition programs, from gas
masks to strategic bombers. In this way, OTEEM will facilitate test program
comparison and reveal broad trends and relationships. The "big picture" made
possible by comparison of OTEEM results should help managers determine the
overall health of OT&E. Universal applicability is the last of the four
desirable OTEEM features; the other three are goal orientation, challenge
sensitivity, and cheap, available data.


METHODOLOGY 


To  satisfy  the  above  requirements,  OTEEM  compares  "snapshots"  taken  at 
different  times  in  the  weapons  system  life-cycle.  The  IOT&E  final  report,  a 
summary  of  IOT&E  predictions  and  assessments,  provides  the  first  of  these 
snapshots.  The  second  snapshot  is  the  field  experience  with  the  production 
weapon  system  summarized  in  the  FOT&E  Phase  One  final  report.  Since  FOT&E  and 
IOT&E  already  examine  many  of  the  same  parameters,  comparison  of  the  final 
reports should be easy. Table 3 is a summary of how OTEEM's report comparison
method  meets  the  desired  characteristics. 


CHARACTERISTIC                 OT&E FINAL REPORT COMPARISON

Goal Orientation:              Final reports emphasize the four primary
                               objectives of IOT&E. OTEEM will compare
                               goal-related dimensions.

Challenge Sensitivity:         The problems show up in FOT&E. OTEEM uses
                               FOT&E data.

Inexpensive Available Data:    FOT&E already gathers the exact data
                               required. Both OT&E reports address same
                               areas.

Universal Applicability:       Four primary IOT&E objectives general
                               enough to apply to almost any system.

Table 3. How Report Comparison Satisfies Desired Characteristics


OTEEM  evaluates  IOT&E  by  comparing  IOT&E  and  FOT&E  assessments  in  five 
dimensions  related  to  the  IOT&E  primary  goals:  effectiveness,  suitability, 
critical  issues,  deficiency  reporting,  and  survivability.  OTEEM  uses  specific 
procedures  for  each  of  these  dimensions. 

The  OTEEM  effectiveness  dimension  measures  the  accuracy  of  the  IOT&E 
weapons  system  effectiveness  assessment.  Weapons  system  effectiveness  is  a 
composite measure of mission accomplishment. The discrete elements
contributing to mission accomplishment are different for each weapons system. For
example, the effectiveness elements for a cruise missile may be accuracy,
range, terrain-following (TF) capability, and time of arrival (TOA). A fighter
aircraft might have slightly different elements: weapon delivery accuracy,
combat radius, sustained-"g" turn capability, and speed. IOT&E and FOT&E final
reports assess each effectiveness element. In the ideal test program, IOT&E
assessments should agree with results determined in FOT&E. The next step is to
quantify the agreement or disagreement. For the purpose of this report, the
adjective ratings for each element are compared. An element that was
satisfactory in IOT&E, but deficient in FOT&E is scored as a disagreement. To arrive
at the accuracy rating, simply compare adjectives for each element. For
example, if ten effectiveness elements are evaluated, and the reports disagree
on two, an 80 percent OTEEM accuracy rating is achieved. The accuracy rating,
therefore, expresses the overall correctness of the IOT&E effectiveness
assessment based on FOT&E results. Sometimes though, IOT&E fails to assess an
element due to insufficient testing, mixed results, etc.

An additional rating, OTEEM completion, expresses the percentage of
elements for which an IOT&E assessment is made. For example, if out of fifteen
elements, five were not rated and five disagreed, OTEEM completion would equal
the 10 rated elements divided by the 15 possible elements, or 67 percent. OTEEM
accuracy would then equal the 5 agreements divided by the 10 rated elements, or
50 percent. The completion rating really expresses the degree to which weapons
system effectiveness is known after IOT&E. In this case, the status of only 67
percent of the elements was known at the end of IOT&E. Together, OTEEM
accuracy and completion make up the IOT&E effectiveness assessment. The same
approach is useful in the next dimension.
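
The accuracy and completion arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name oteem_ratings and its parameter names are hypothetical and do not come from the report, and the numbers are those of the fifteen-element example in the text.

```python
def oteem_ratings(total_elements, unrated, disagreements):
    """Return (completion, accuracy) as fractions between 0 and 1.

    total_elements: elements examined in both final reports
    unrated:        elements that IOT&E left unassessed
    disagreements:  rated elements where IOT&E and FOT&E differ
    (All three names are illustrative, not taken from the report.)
    """
    if total_elements <= 0:
        raise ValueError("total_elements must be positive")
    if not 0 <= unrated <= total_elements:
        raise ValueError("unrated must lie between 0 and total_elements")
    rated = total_elements - unrated
    if not 0 <= disagreements <= rated:
        raise ValueError("disagreements must lie between 0 and the rated count")
    completion = rated / total_elements
    # Accuracy is undefined when nothing was rated; report it as None.
    accuracy = (rated - disagreements) / rated if rated else None
    return completion, accuracy


# The fifteen-element example from the text: five unrated, five disagreements.
completion, accuracy = oteem_ratings(15, 5, 5)
print(f"completion = {completion:.0%}, accuracy = {accuracy:.0%}")
```

Keeping the two ratings separate matters: a program can score high accuracy on a small number of rated elements while leaving most elements unknown.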

Suitability has well-defined elements common to many different test
programs. Recall from Chapter Three that these elements include availability,
compatibility, transportability, interoperability, reliability,
maintainability, safety, human factors, and logistics supportability. Many of
these factors can be further broken down into smaller components. For example,
logistics supportability includes manpower, technical data, training, and
wartime usage rates. (17:35) Many suitability elements can be quantified with
measures like Mean Time Between Critical Failure (MTBCF) or Mean Time To Repair
(MTTR). Suitability is scored in OTEEM accuracy and OTEEM completion using the
same techniques as OTEEM effectiveness. The next dimension uses a similar
approach.

IOT&E and FOT&E final reports list critical issues. OTEEM approaches the
critical issue dimension two ways. First, what percentage of the issues were
answered in IOT&E; and second, were the answers right? Unfortunately, after
listing the issues, the final reports may never explicitly answer them.
Instead, issue answers are often implied in the report text or summary.
Therefore, critical issue accuracy can only be judged by inference. The OTEEM
critical issue dimension includes percentages answered and accurate. So far,
effectiveness, suitability, and critical issue dimensions have all shared a
common accuracy/completion approach. The next dimension, deficiency reporting,
requires a different comparison technique.

The deficiency reporting dimension can be seen as an expression of weapon
system maturity at the end of IOT&E. The more mature a weapon is when
produced, the fewer critical deficiencies show up in FOT&E. OT&E final reports
divide deficiencies into two categories, mission critical and nonmission
critical (other). Different labels are used in different programs, but
critical deficiencies impact basic effectiveness/suitability, while other
deficiencies have a lower level of urgency. OT&E final reports document the
number of each type of deficiency. As a weapon system matures, the number of
new deficiencies should decrease. Guided by this assumption, OTEEM expresses
FOT&E deficiencies as a percentage of IOT&E deficiencies. OTEEM has a Critical
Deficiency Reduction Percentage (CDRP) and an Other Deficiency Reduction
Percentage (ODRP). For example, if IOT&E reports 32 critical deficiencies, and
FOT&E reports 21, CDRP = 66 percent. Unfortunately, until a data base is
gathered from many test programs, it will be hard to say whether a particular
percentage is good or bad. The final dimension is more difficult to quantify.
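
The deficiency percentage described above is a single ratio, sketched here in Python. The function name deficiency_reduction_percentage is a hypothetical label chosen for illustration; the only figures used are the 32 and 21 critical deficiencies from the example in the text.

```python
def deficiency_reduction_percentage(iot_e_count, fot_e_count):
    """Express FOT&E deficiencies as a percentage of IOT&E deficiencies.

    The same ratio serves for critical (CDRP) and other (ODRP) deficiencies.
    """
    if iot_e_count <= 0:
        raise ValueError("IOT&E deficiency count must be positive")
    if fot_e_count < 0:
        raise ValueError("FOT&E deficiency count cannot be negative")
    return 100.0 * fot_e_count / iot_e_count


# Example from the text: 32 critical deficiencies in IOT&E, 21 in FOT&E.
cdrp = deficiency_reduction_percentage(32, 21)
print(f"CDRP = {cdrp:.0f} percent")
```

Under this definition a value below 100 percent means fewer deficiencies were reported in FOT&E than in IOT&E, and a value above 100 percent means more were reported.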

Survivability could be expressed several ways. Some of the possible
alternatives include system performance against specific threats, or
probability of penetration when opposed by a range of different threats.
Survivability estimation techniques and results are often highly classified and
not available for analysis. Because of this, exact techniques/examples are
beyond the scope of this report, but comparison of the various elements
(specific threat systems, or aggregate profiles) between IOT&E and FOT&E should
yield survivability dimension accuracy and completion percentages. The
survivability dimension is the last component of the OTEEM assessment. Next, a
sample application helps illustrate the OTEEM in action.


APPLICATION

Any  example  must  be  general  enough  to  avoid  specific  classified  element 
values.  Again,  the  purpose  of  the  example  is  to  illustrate  the  technique  and 
stimulate thought, not to judge the effectiveness of a particular program. A
fair  evaluation  using  OTEEM  would  require  much  more  in-depth  analysis  and 
probably  a  classified  format.  The  Air-Launched  Cruise  Missile  was  chosen  for 
the  OTEEM  application  exercise. 

The  Air-Launched  Cruise  Missile  (ALCM),  officially  designated  the  AGM-86B, 
is  a  strategic  weapon  system  procured  in  the  late  1970s  and  early  1980s.  IOT&E 
on the ALCM extended from 23 April 1979 to 31 March 1980. (14:i) It was
conducted in conjunction with a fly-off between two contractors and consisted of
10 launches and 10 captive carries for each contractor. (14:i) Captive-carry
missions simulate missile flight with the missile connected to the aircraft
pylon. The winning contractor, Boeing Aerospace Company, was awarded the
contract, and FOT&E was conducted between April 1980 and May 1981. (15:ii)

During  FOT&E,  eleven  launches  were  conducted  with  an  unspecified  number  of 
captive carries. (15:ii) It's important to note that the ALCM is not a perfect
example of IOT&E supporting milestone decisions. A critical need for the
system forced a production decision before IOT&E was complete. (5:45) Fourteen
areas were examined in both IOT&E and FOT&E. These areas were separated into
the OTEEM dimensions below. Tables 4-7 list the raw data extracted from the
reports for later calculation of the OTEEM ratings. Survivability was not
included due to classification. In these tables, "S" = satisfactory,
"U" = undetermined, "D" = deficient. The adjective ratings were based on test
team assessments in the final reports. In a few cases, the adjective rating
was evident, but not clearly stated as "satisfactory" etc.


ELEMENT                          IOT&E RATING     FOT&E RATING

Reliability                      D                D
Compatibility                    U                U
B-52 systems                     U                U
B-52 range/handling              S                S
Interoperability                 U                D
Mission Planning                 U                U
Data transfer                    U                U
Throughput                       U                U
Output Accuracy                  U                U
Ease of use                      U                U
Availability                     S                D
Logistics reliability            S                D
Maintainability                  S                S
Logistics supportability         S                D
RAM Interface                    U                D
Maintenance concept
  (base/depot)                   S/U              S/S
Support Equipment                S                S
Planned supply, support          U                U
Transportation, packaging,
  and handling                   S                S
Technical data                   D                D
Facilities                       S                S
Manpower                         S                S
Training                         S                S
Maintenance training             S                S
Training suit.                   S                S
Human Factors                    S                S
Software suitability             U                U
Software maintainability         U/S              U/D
Software usability               D                U

OVERALL SUITABILITY              NO RATING (U)    D

Table 4. ALCM Suitability Dimension

ELEMENT                          IOT&E RATING     FOT&E RATING

Accuracy: en route/terminal      S/U              S/U
Range                            S                S
Terrain Following (TF)           U                D
Launch envelope                  U                U
Time of Arrival (TOA)            S                U
Alternate mission capability     S                S
Operational Test Launch (OTL)
  payload                        D                D
Arm and fuze warhead             S                S
Captive carry missile status     S                S

OVERALL EFFECTIVENESS            D                D

Table 5. ALCM Effectiveness Dimension

                                     ANSWERED        CORRECT
ISSUE                                IN IOT&E        IN FOT&E

a. AGM-86B v. AGM-109, which         YES             N/A
   is most cost-effective                            (assume YES)
   answer to AF need?

b. Tech. performance/design          NOT             --
   parameters demo'd within          ANSWERED
   appropriate threshold value?

c. Compatible with SRAM and          NOT             --
   gravity weapons?                  ANSWERED

d. Does Mission Completion           NOT             --
   Success Probability (MCSP)        ANSWERED
   match SAC requirement?

e. Can digital terrain data          YES             YES
   and operational navigation
   requirements be integrated
   in effective mission profiles?

Table 6. ALCM Critical Issue Dimension

                           IOT&E     FOT&E

CRITICAL DEFICIENCIES:     22        80

Table 7. ALCM Deficiency Reporting Dimension

The final step is to process the raw data using the techniques explained
above. When scoring accuracy, only IOT&E elements listed "S" or "D" count
(it's impossible to measure the accuracy of "U"). IOT&E "S" or "D" elements
that decay to "U" in FOT&E are scored as disagreements. IOT&E "D" elements
that improve to "S" in FOT&E are not considered disagreements, since system
improvement is the desired consequence of IOT&E deficient ratings. Using the rules
and techniques above to reach scoring percentages for each of the dimensions
yields the following OTEEM results for ALCM IOT&E. The implications of the
ALCM OTEEM results are addressed in Chapter Five.

EFFECTIVENESS:    OTEEM Accuracy   = 86%
                  OTEEM Completion = 65%

SUITABILITY:      OTEEM Accuracy   = 79%
                  OTEEM Completion = 53%

CRITICAL ISSUES:  Percent Answered = 40%
                  Percent Correct  = 100%

DEFICIENCIES:     CDRP = 364%
                  ODRP = 371%

SURVIVABILITY:    Not included due to classification
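
The scoring rules stated before these results can be sketched as a short Python function that works on pairs of adjective ratings. This is an illustration only: the function name score_dimension is hypothetical, and the sample ratings are invented to exercise each rule; they are not ALCM data and are not meant to reproduce the percentages listed above.

```python
def score_dimension(pairs):
    """Score (IOT&E, FOT&E) adjective rating pairs: "S", "D", or "U".

    Returns (completion, accuracy) as fractions. Accuracy is None when
    IOT&E rated no elements at all.
    """
    if not pairs:
        raise ValueError("at least one element is required")
    # Only elements that IOT&E rated "S" or "D" can be checked for accuracy.
    rated = [(iot, fot) for iot, fot in pairs if iot in ("S", "D")]
    # A pair agrees when the ratings match, or when an IOT&E deficiency
    # was later corrected (D -> S), which is not scored as a disagreement.
    agreements = sum(
        1 for iot, fot in rated if iot == fot or (iot == "D" and fot == "S")
    )
    completion = len(rated) / len(pairs)
    accuracy = agreements / len(rated) if rated else None
    return completion, accuracy


# Hypothetical ratings, invented for illustration only.
sample = [
    ("S", "S"),  # agreement
    ("S", "D"),  # disagreement: satisfactory in IOT&E, deficient in FOT&E
    ("D", "S"),  # not a disagreement: the deficiency was fixed
    ("D", "U"),  # disagreement: rating decayed to undetermined
    ("U", "D"),  # unrated in IOT&E, so excluded from accuracy
]
completion, accuracy = score_dimension(sample)
print(f"completion = {completion:.0%}, accuracy = {accuracy:.0%}")
```

A real evaluation would also have to decide how split ratings such as "S/U" and overall summary rows are counted; the sketch assumes each pair is a single element.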

This chapter began with a discussion of desirable OTEEM characteristics
including goal orientation, sensitivity to challenges, accessible and
inexpensive data, and universal applicability. The OTEEM report comparison method has
all the desirable features. Next, the specific methodology for OTEEM was
introduced and applied to the ALCM. Chapter Five discusses various findings
highlighted by the OTEEM application and some miscellaneous observations and
concerns.


CHAPTER FIVE


FINDINGS

If truth is beauty, how come no one has her hair done at the
library? (13:231)

- Lily Tomlin

Chapter Four developed OTEEM and applied it to the ALCM IOT&E assessment.
The application exercise was a trial run designed to uncover OTEEM problems and
suggest refinements, the subject of this chapter. The application exercise
raises several important issues.

A glance at the ALCM effectiveness and suitability data (Tables 4 and 5)
reveals the first problem: the large number of IOT&E undetermined or "U"
elements. A possible explanation lies in the unique circumstances surrounding
the ALCM program. Test managers planned a limited IOT&E program to evaluate
unproven technology in the face of a critical need for the system. When
technical problems cropped up in testing, decision makers bought the system
anyway, accepting a degree of uncertainty in effectiveness and suitability.
Justified or not, small completion percentages like these have an effect on the
evaluation. Obviously, it's tough to evaluate assessment accuracy without
assessments. In this example, however, OTEEM still demonstrated its worth.
OTEEM completion percentages highlighted the large proportion of ALCM IOT&E
unknowns, a crucial insight for managers reviewing the program. The FOT&E
disposition of these IOT&E unknowns is another important issue.

Eighty-nine percent of the ALCM elements rated for the first time in FOT&E
were deficient. What could explain a large percentage of IOT&E unknowns
turning up deficient in later testing? One possibility is that test managers,
realizing the impact of negative OT&E assessments in today's acquisition
system, want an air-tight case before reporting deficiencies. If a degree of
uncertainty exists, some test managers may feel the "U" is safer than a
qualified "D" in the final report. Unfortunately, such a practice can hide
vital information from the decision maker. To monitor this potential problem
area, an OTEEM measure showing the FOT&E disposition of undetermined IOT&E
elements would be a valuable addition to the method. OTEEM leads to further
insights when the disagreements between IOT&E and FOT&E ratings are analyzed.
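
One possible form for such a disposition measure is sketched below in Python.
It is an assumption about how the measure might be computed, not a definition
from this report: it takes the elements IOT&E left "U", keeps those rated for
the first time in FOT&E, and reports the percentage rated "D". The function
name and the sample ratings are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def percent_deficient_when_first_rated(
    pairs: Iterable[Tuple[str, str]]
) -> Optional[float]:
    """Percent of IOT&E "U" elements that FOT&E first rated as "D".

    Elements still "U" in FOT&E are left out because they have not yet
    been rated. Returns None when no "U" element was rated in FOT&E.
    """
    first_rated = [fot for iot, fot in pairs
                   if iot == "U" and fot in ("S", "D")]
    if not first_rated:
        return None
    return 100.0 * first_rated.count("D") / len(first_rated)


# Hypothetical (IOT&E, FOT&E) ratings, not the ALCM data.
sample = [("U", "D"), ("U", "D"), ("U", "D"), ("U", "S"),
          ("U", "U"), ("S", "S")]
print(percent_deficient_when_first_rated(sample))
```

A persistently high value across programs would support the concern that
"U" ratings are standing in for qualified "D" ratings.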

In the ALCM program, the Time of Arrival (TOA) function was rated
satisfactory in IOT&E, but undetermined in FOT&E. Briefly, the TOA function is
a guidance computer routine commanding the missile to speed up or slow down in
order to make a particular timing profile. Missile performance and
aerodynamics are absolute limits on the TOA capability. The IOT&E and FOT&E
test teams evidently disagreed over the meaning of the evaluation objective:
"Evaluate the operational capability of the TOA function in the missile
computer." (14:30) The IOT&E team wanted to see if the routine worked at the
anticipated slow-down or speed-up rate, but the final report for IOT&E admits
that the testing of this function was limited. (14:31) The test team did not
induce artificial errors, evaluating only naturally occurring timing errors.
These errors were "small and did not tax the TOA function's capability."
(14:31) TOA testing was also limited by conflicting higher priority DT&E
objectives. (23:--) Nevertheless, since the demonstrated TOA speed-up and
slow-down was close to the expected value, the IOT&E team rated TOA
satisfactory. The FOT&E team used a different philosophy.

During FOT&E Phase One, TOA was rated "U" because the team felt that
although the function was correct, TOA's exact capability was unknown. They
recommended validation and analysis of the mission planning factors
constraining TOA performance. (15:38) Until testing determined the limits of
the capability for TOA, the FOT&E team did not feel justified in giving a
satisfactory rating. The two approaches were unquestionably different. IOT&E
personnel verified the TOA function, while FOT&E team members tried to
determine the TOA capability. Clearly, this disagreement would never have
occurred if test objectives had been carefully written with no ambiguities, and
then followed to the letter. OTEEM analysis proved valuable by highlighting
this disagreement and encouraging closer investigation. IOT&E and FOT&E
reports also disagreed on ALCM availability.

The reports rated availability satisfactory in IOT&E, but deficient in
FOT&E. Again, the problem lay in the wording and interpretation of the test
objective. The objective was concisely written as: "Estimate the availability
of the AGM-86B weapon system." (14:53) According to the IOT&E report, this
meant "apparent availability." (14:55) Apparent availability is the number of
missiles apparently available to support an Emergency War Order generation and
does not include missiles that are inoperative, but have not been detected.
(15:59) Because of these undetected, inoperative missiles, the FOT&E team
favored real availability over apparent availability.

The primary measure of AV (air vehicle) availability is termed "real"
availability. Calculating real availability takes into account
missiles not mission capable because of undetected failures, such as in
the engine, and missiles down for inspection or for maintenance caused
by detected failures. . . . Real availability is a statistical measure of
the number of mission capable missiles at a random point in
time. (15:59)

During IOT&E, real availability was indeed below the appropriate threshold,
but in the words of the IOT&E report: "Since the evaluation criteria were based
on apparent availability of a mature AGM-86B system, availability of the
AGM-86B was satisfactory." (14:55) In the IOT&E report executive summary, no
distinction was made between apparent and real availability: "Availability,
logistics reliability, maintainability . . . are all satisfactory." (14:v)
Clearly, this kind of misunderstanding can have an effect on decision making.
Another problem area highlighted by the ALCM application of OTEEM concerns the
critical issue dimension.

The ALCM IOT&E test program left several of the critical issues unanswered.
This happened despite 1979 DoD direction that critical issues were:

Those aspects of a system's capability, either operational, technical,
or other, that must be questioned before a system's overall worth can
be estimated, and that are of primary importance to the decision
authority in reaching a decision to allow the system to advance into
the next acquisition phase. (16:22)

No reason was given in the report for leaving these questions unanswered.
Although beyond the scope of this paper, a rigorous investigation into the
reasons for the unanswered issues could result in valuable lessons.

The  final  ALCM  insight  from  OTEEM  analysis  concerns  deficiency  reporting. 
ALCM  experienced  an  alarming  364  percent  increase  in  critical,
mission-threatening  deficiencies  in  the  field.  Is  this  increase  normal,  or  was  ALCM
particularly  immature  when  ordered  into  production?  Right  now,  it's  impossible  to
say.  Only  comparison  with  different  programs  will  allow  the  test  manager  to 
get  a  feel  for  what  is  normal.  Even  without  a  basis  of  comparison,  however,  it 
seems  reasonable  to  question  the  production  maturity  of  this  weapon. 
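
The 364 percent figure can be reproduced from the Table 7 counts with the short
Python sketch below. The sketch assumes the deficiency reporting percentage is
the FOT&E count expressed as a percentage of the IOT&E count; that assumption
matches the published CDRP value, but the function name is hypothetical and the
formula is not quoted from the method's definition.

```python
def reporting_percent(iot_count: int, fot_count: int) -> float:
    """FOT&E deficiency count as a percentage of the IOT&E count.

    Assumed form of the deficiency reporting measure; it reproduces the
    CDRP value reported for ALCM from the Table 7 counts.
    """
    if iot_count <= 0:
        raise ValueError("IOT&E count must be positive")
    return 100.0 * fot_count / iot_count


# Critical deficiency counts from Table 7: 22 in IOT&E, 80 in FOT&E.
cdrp = reporting_percent(22, 80)
print(f"CDRP = {cdrp:.0f}%")
```

Once several programs have been scored this way, the same calculation would
give the basis of comparison the paragraph above calls for.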

In summary, OTEEM analysis highlighted ALCM IOT&E problems in completing
element ratings, interpreting test objectives, answering critical issues, and
detecting deficiencies. At this point, it's important to remember that today,
ALCM is an extremely capable weapon system with a front-line role in
deterrence. However, this is not to say that the test program could not have
stood some improvement. Although OTEEM successfully uncovered problem areas in
the ALCM IOT&E program, OT&E final reports will need some improvements if OTEEM
is to work. These suggested improvements and other miscellaneous ideas are
presented below.


REPORT IMPROVEMENTS

Deficiency Data Should Specify Date Written

It was difficult to determine whether FOT&E deficiency data included
write-ups dating from IOT&E. The report should specify the testing phase or
date when each deficiency was discovered.

Use of Thresholds

For many elements, test teams award adjective ratings based on numerical
thresholds. Reports should clearly specify the threshold values acceptable in
each element. If the thresholds change for FOT&E, the fact should be clearly
highlighted. A brief explanation of what's "satisfactory" or "deficient,"
using the threshold values, will help keep things on a quantitative basis where
possible.

Test Methodology

The final report should explain the way each element was evaluated.
Reports should contain sufficient detail to explain disagreements like real
versus apparent availability. OTEEM can't compare apples with oranges. Some
reports already contain this type of information in sufficient detail.

Critical  Issues 

Final reports should clearly address critical issues. If issues are left
unanswered, the report should explain why. The information preserved by such a
practice would prove invaluable for later analysis. The current situation
requires reading between the lines and guesswork.


MISCELLANEOUS CONCERNS

Quantitative versus Qualitative

The  approach  used  in  this  report  was  to  compare  qualitative  adjective 
ratings  based  on  quantitative  thresholds.  Another  approach  would  be  to  compare 
the  exact  numerical  value  for  each  element.  The  problem  would  be  how  to  lump 
30  miles  of  range  difference  with  200  feet  of  accuracy  difference  and  come  up 
with  some  usable  overall  rating  for  effectiveness  accuracy.  Perhaps  this 
quantitative  analysis  could  best  be  included  as  an  appendix  to  the  regular 
OTEEM  ratings.

Effect of Changing Missions and Threats

Weapons systems are sometimes used for unforeseen missions against
unplanned threats. An example is the use of the high-altitude B-52 bomber for
low-altitude weapons delivery. Test programs should not be expected to
anticipate the effect of completely different mission roles and threats after
the weapon is fielded. To avoid the impact of innovative mission roles, the
performance data used to judge the effectiveness of IOT&E should be collected
early in the operational life of the weapon. Use of the FOT&E Phase One report
fulfills this requirement. If other data is used, it should be collected no
later than Initial Operational Capability (IOC) plus two years, a commonly
accepted milestone for weapons system maturity.

System Improvements Masking Poor Predictions

Suppose IOT&E predicted that weapons system performance in a particular
area would be satisfactory. In this hypothetical example, subsequent
improvements, unforeseen at the time IOT&E was conducted, eventually ensured
that the predicted performance level was reached. Without the unforeseen
improvements, the system would not have reached the predicted level. In this
case, an OTEEM comparison of predicted versus experienced performance would not
highlight the poor IOT&E prediction. There really is no easy solution to this
dilemma, except to note that weapons system improvements are natural and
desirable. If improvements happen to mask a poor prediction, at least the
weapons system is doing the job at the predicted performance level. The
opposite case, a system performing at a worse level than predicted, would be
highlighted by OTEEM and investigated.

IOT&E/FOT&E Gaming the System

Anytime evaluation is used, someone will game the system in order to look
good. Gaming OTEEM would be easy. IOT&E personnel could simply make extremely
conservative estimates in the different elements of effectiveness and
suitability. Since OTEEM only spots FOT&E elements that don't live up to
expectations, the conservative IOT&E would look very good. Such gaming would
reduce the utility of the IOT&E assessment and mislead decision makers.
Fortunately, OTEEM gaming is unlikely because of the time between the start of
IOT&E and the completion of FOT&E. Many of the IOT&E folks would have moved on
to other jobs before an OTEEM evaluation could ever be made and would have
nothing to gain by gaming the system. Nevertheless, assessment confidence
intervals might be used to reduce any tendency to make overly conservative
estimates. For example: Operational Range = 1000nm (plus or minus 100nm).
OTEEM could be changed to highlight any result outside the error band.
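
The error band idea can be sketched as a simple check in Python. This is a
hypothetical illustration only: the function name and the observed values are
invented, and the sketch assumes a result exactly on the edge of the band
counts as inside it.

```python
def outside_error_band(predicted: float, tolerance: float,
                       observed: float) -> bool:
    """True when the observed value falls outside predicted +/- tolerance.

    Results are flagged on either side of the band, so an overly
    conservative estimate is highlighted just like an optimistic one.
    """
    if tolerance < 0:
        raise ValueError("tolerance must not be negative")
    return abs(observed - predicted) > tolerance


# Predicted range of 1000nm, plus or minus 100nm, as in the example above.
# The observed values are hypothetical.
for observed_nm in (950.0, 1150.0, 820.0):
    flagged = outside_error_band(1000.0, 100.0, observed_nm)
    print(observed_nm, "highlight" if flagged else "within band")
```

Because the check is two-sided, a system that greatly exceeds a deliberately
low estimate would be highlighted along with one that falls short.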

IOC Plus Two Year Assessment


Sometimes  FOT&E  Phase  One  is  too  early  to  get  a  good  feel  for  system 
performance.  As  mentioned  above,  IOC  plus  two  years  is  accepted  as  a  general 
definition  of  system  maturity.  At  that  time,  the  operating  command  should  have 
a  wealth  of  experience  with  the  actual  operational  characteristics  of  the 
system.  Assembling  the  data  would  require  tracking  down  all  the  various
offices  that  file  information  on  reliability,  system  accuracy,  etc.  This
information  could  provide  the  most  valid  basis  for  OTEEM  comparison,  and  would 
be  a  valuable  addition  to  OTEEM.  It  could  even  be  used  to  evaluate  the 
effectiveness  of  FOT&E  Phase  One. 

Use  of  Actual  Milestone  III  Briefing  Materials 

Since  one  of  the  main  purposes  of  IOT&E  is  to  support  the  production 
decision,  the  actual  IOT&E  assessment  briefing  given  to  the  decision  makers 
would  be  of  interest.  Furthermore,  since  the  IOT&E  final  report  may  include 
information  gathered  after  the  production  decision,  it  may  not  represent  the 
actual  estimates  provided  in  the  IOT&E  milestone  III  assessment.  To  ensure 
this  valuable  information  isn’t  lost  in  the  shuffle,  milestone  III  IOT&E 
assessment  briefings  could  be  included  as  an  appendix  in  the  IOT&E  final 
report. 

Retrofit  Information 


A  weapon  requiring  a  large  number  of  retrofits  to  reach  effective  and 
suitable  performance  was  probably  immature  when  produced.  A  measure  showing 
how  many  retrofits  are  accomplished  between  milestone  III  and  IOC  plus  two 
years  might  also  be  a  good  addition  to  OTEEM.  This  measure  is  closely  related 
to  deficiency  reporting. 

This chapter analyzed some of the issues highlighted by the OTEEM
application in Chapter Four. Two findings result. First, OTEEM is clearly
capable of detecting a variety of IOT&E problems in individual test programs.
For example, OTEEM analysis underscored the importance of clear and precise
test objectives. Secondly, some final report improvements are necessary to
facilitate OTEEM use. At the end of the chapter, general concerns were raised
addressing topics ranging from gaming the system to the use of retrofit data.
Overall, this chapter demonstrates that there are valuable insights to be
gained through the application of OTEEM. Chapter Six summarizes this report
and makes recommendations.


CHAPTER  SIX 


SUMMARY AND RECOMMENDATION

This report began with a problem, introduced and applied a solution, and
discussed the result. A short review of each chapter brings the entire report
into focus and provides a foundation for recommendation.

In Chapter One, the problem was introduced. Despite supposedly thorough
testing, there is much debate over the capabilities of new weapons systems.
For this reason, critics argue that weapons testing is inadequate or
ineffective. Their argument is difficult to dispute, since the Air Force is
operating its OT&E system without an objective feedback method. The lack of a
feedback system and the atmosphere of controversy surrounding acquisition
decisions make it particularly hard to determine what problems exist in OT&E
and decide if changes are worthwhile. Since effective operational testing is
clearly vital in acquiring effective and suitable weaponry, an objective
evaluation technique, like the one proposed here for IOT&E, is needed to
provide this feedback.

Chapter Two showed that the history of OT&E is characterized by frequent
organizational change as managers searched for ways to procure effective and
suitable weapons. However, in 1970, after 30 years of ineffective changes, the
operational testing system was pronounced a failure. The tendency for
repetitive change evident in OT&E history is symptomatic of a poorly operating
system with inadequate feedback. In the past, wars provided sporadic general
feedback for weapon testing efforts, but never allowed managers to determine
the exact problems. A pattern of spasmodic, ineffective organizational change
was the result, and will be the result unless the Air Force adopts an
appropriate operational testing feedback technique. Chapters One and Two argue
that OTEEM is needed to break this pattern and efficiently, objectively
diagnose OT&E problems.

The next two chapters introduced and applied OTEEM. Chapter Three laid the
foundation needed by readers unfamiliar with testing and evaluation. An
understanding of the terminology and philosophy behind present day OT&E is
crucial to an appreciation for the Operational Test Effectiveness Evaluation
Method. In Chapter Four, desired characteristics like universal applicability
were discussed. Then, a crude version of OTEEM was introduced and applied to a
real-world operational test program with surprising results.

Chapter Five discussed these results. For example, investigation of
factors emphasized by OTEEM analysis showed that interpretation of test
objectives was a problem in ALCM IOT&E. Additionally, the chapter contained
suggestions to make reports more conducive to OTEEM analysis. Finally, the
chapter discussed general concerns like how unanticipated weapons system
improvements might mask poor IOT&E predictions.

After these five chapters, the reader should recognize that the absence of
an objective feedback technique has contributed to the acquisition debate and
has historically handicapped Air Force ability to judge the effectiveness of
its testing system. However, using data available today, it is possible to
devise an evaluation technique based on OT&E primary goals to provide this
missing objective feedback. The benefits offered by OTEEM range from
microscopic post-mortems of specific test programs, to macroscopic views of
broad operational testing trends. Given the importance of operational testing,
some form of objective evaluation is an absolute necessity.


RECOMMENDATION 

Logically,  OTEEM  should  be  implemented  by  the  service  organization  already 
charged  with  IOT&E  management  or  oversight.  In  the  case  of  the  Air  Force,  the 
Air  Force  Operational  Test  and  Evaluation  Center  (AFOTEC)  is  the  obvious
choice.  OSD’s  DOTE  may  also  be  interested  in  the  method. 

A  special  group  should  be  formed  at  AFOTEC  to  handle  OTEEM  affairs.  The 
group  should  begin  a  trial  program,  applying  OTEEM  to  selected  systems  that 
have  reached  IOC  plus  two  years.  After  this  study  is  completed,  a  finalized 
OTEEM  technique  should  be  established  and  implemented.  A  data  base  of  OTEEM
results  could  then  be  generated  and  could  include  such  components  as  a  yearly
OTEEM  report.

For  decades,  operational  testing  managers  have  searched  for  the  key  to  OT&E 
success.  As  controversy  rages  over  increasingly  complex  and  expensive  weapons, 
managers  must  ensure  OT&E  is  as  effective  and  accurate  as  possible.  OTEEM  may 
finally  be  a  way  to  optimize  OT&E  methods,  silence  the  critics,  and  ultimately 
ensure  that  the  weapons  reaching  the  ramp  really  are  effective  and  suitable. 




BIBLIOGRAPHY 


Books 

1.  Rasor, Dina (ed). More Bucks, Less Bang: How the Pentagon Buys
    Ineffective Weapons. Washington, DC: Fund for Constitutional
    Government, 1983.

2.  The Reader's Digest Treasury of Modern Quotations. New York: Reader's
    Digest Press, 1975.


Articles  and  Periodicals 

3.  Air Force Times. 21 September 1987, p. 18.

4.  Biddle, Wayne. "How Much Bang for the Buck?" Discover (September 198A),
    pp. 50-63.

5.  Canan, James W. "Testing from Chips to Chocks." Air Force Magazine
    (February 1988), pp. 40-45.

6.  Lerner, Michael A., with John Barry. "Sergeant York Musters Out."
    Newsweek (9 September 1985), p. 23.

7.  Morrison, David C. "OT&E Fails to Quiet the Critics." Military Logistics
    Forum (June 1986), pp. 43-46, 63.

8.  "Pentagon Weighs Plan to Expand Testing Schedule of Weapons Systems."
    Aviation Week and Space Technology (9 March 1987), pp. 264-265.

9.  Powell, Stewart, and Melissa Healy. "The B-1 Bomber: A Flying Lemon."
    U.S. News and World Report (24 November 1986), p. 29.

10. Van Voorst, Bruce. "The Pentagon's 'Flying Edsel.'" Time (19 January
    1987), p. 21.


Official  Documents 

11. Adams, Ronald M., Maj, USAF. Test Concurrency and the Carlucci
    Initiatives: When Is More Too Much? Maxwell AFB, AL, 1984.

12. Everly, Kieth V., Maj, USAF. United States Air Force Policy for
    Operational Test and Evaluation. Maxwell AFB, AL, 1987.

13. Oertel, Robert E., Maj, USAF. Operational Test and Evaluation: The Quest
    for Independence. Maxwell AFB, AL, 1985.




14. US Department of the Air Force: Air Force Test and Evaluation Center.
    "AGM-86B Initial Operational Test and Evaluation Final Report on ALCM
    Competition (U)." Kirtland AFB, NM, 1980.

15. US Department of the Air Force: Air Force Test and Evaluation Center.
    "AGM-86B Air Launched Cruise Missile Operational Test and Evaluation
    Final Report (U)." Kirtland AFB, NM, 1981.

16. US Department of the Air Force: Test and Evaluation. AF Regulation
    80-14. Washington, DC: Government Printing Office, 1980.

17. US Department of the Air Force: Test and Evaluation. AF Regulation
    80-14. Washington, DC: Government Printing Office, 1986.

18. US Department of the Air Force. The Tongue and Quill. AF Pamphlet 13-2.
    Washington, DC: Government Printing Office, 1986.


Unpublished  Materials 

19. US Department of the Air Force: Air Command and Staff College. "Thinking
    About War: A Survey of Military Theory," text 00033 R01 8503.
    Maxwell AFB, AL.

20. US Department of the Air Force: Extension Course Institute (AU).
    "History of U.S. Air Power, Course 50, vol 1," text 00050 01 8406.
    Maxwell AFB, AL.


Other  Sources 

21. Feighery, Col, USAF. OSD, DOTE. Telephone conversation in October 1987.

22. Lloyd, Dr. AFOTEC/RSH. Telephone conversation on 19 January 1988.

23. Pulcher, Larry J., Maj, USAF. ACSC/EPT. Conversation in January 1988
    recalling his personal experiences as an ALCM IOT&E test team member.




